{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training a TensorFlow Multitask Recommender on a SageMaker Cluster"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this tutorial, we build a simple matrix factorization model using the [MovieLens 100K dataset](https://grouplens.org/datasets/movielens/100k/) with TensorFlow Recommender System (TFRS) using Amazon SageMaker. \n",
    "\n",
    "We will use this model to recommend movies for a given user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q sagemaker==2.9.2\n",
    "!pip install -q sagemaker-experiments==0.1.24\n",
    "!pip install -q tensorflow==2.3.0\n",
    "!pip install -q tensorflow-recommenders==0.2.0\n",
    "!pip install -q tensorflow-datasets==4.0.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker\n",
    "import pandas as pd\n",
    "\n",
    "sess   = sagemaker.Session()\n",
    "bucket = sess.default_bucket()\n",
    "role = sagemaker.get_execution_role()\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "sm = boto3.Session().client(service_name='sagemaker', region_name=region)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Specify Input Data S3 URI and `Distribution Strategy`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://sagemaker-us-east-1-835319576252/tensorflow_datasets/train/', 'S3DataDistributionType': 'ShardedByS3Key'}}}\n"
     ]
    }
   ],
   "source": [
    "from sagemaker.inputs import TrainingInput\n",
    "\n",
    "input_train_data_s3_uri ='s3://{}/tensorflow_datasets/train/'.format(bucket)\n",
    "\n",
    "s3_input_train_data = TrainingInput(s3_data=input_train_data_s3_uri,\n",
    "                                    distribution='ShardedByS3Key')\n",
    "print(s3_input_train_data.config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup Metrics To Track Model Performance\n",
    "\n",
    "These sample log lines...\n",
    "```\n",
    "499/500 [=====>..] - ETA: 3s - root_mean_squared_error: 1.1198 - factorized_top_k/top_10_categorical_accuracy: 0.481 - factorized_top_k/top_50_categorical_accuracy: 0.607 - factorized_top_k/top_100_categorical_accuracy: 0.885\n",
    "```\n",
    "...will produce the following metrics in CloudWatch:\n",
    "\n",
    "`root_mean_squared_error` = 1.1198\n",
    "\n",
    "`factorized_top_k/top_10_categorical_accuracy` = 0.481\n",
    "\n",
    "`factorized_top_k/top_50_categorical_accuracy` = 0.607\n",
    "\n",
    "`factorized_top_k/top_100_categorical_accuracy` = 0.885"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "metrics_definitions = [    \n",
    "     {'Name': 'root_mean_squared_error', 'Regex': 'root_mean_squared_error: ([0-9\\\\.]+)'},\n",
    "     {'Name': 'top_10_categorical_accuracy', 'Regex': 'factorized_top_k/top_10_categorical_accuracy: ([0-9\\\\.]+)'},\n",
    "     {'Name': 'top_50_categorical_accuracy', 'Regex': 'factorized_top_k/top_50_categorical_accuracy: ([0-9\\\\.]+)'},\n",
    "     {'Name': 'top_100_categorical_accuracy', 'Regex': 'factorized_top_k/top_100_categorical_accuracy: ([0-9\\\\.]+)'}\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup Hyper-Parameters for Classification Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "epochs=10\n",
    "learning_rate=0.00003\n",
    "dataset_variant='100k' # movielens 100k, 1m, 20m, 25m, etc\n",
    "embedding_dimension=32 # dimension (k) of our user and item embeddings\n",
    "enable_tensorboard=True\n",
    "train_instance_count=1\n",
    "train_instance_type='ml.p3.2xlarge'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup Our TensorFlow Script to Run on SageMaker\n",
    "Prepare our TensorFlow model to run on the managed SageMaker service"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.tensorflow import TensorFlow\n",
    "\n",
    "estimator = TensorFlow(entry_point='train_multitask.py',\n",
    "                       source_dir='src',\n",
    "                       role=role,\n",
    "                       instance_count=train_instance_count,\n",
    "                       instance_type=train_instance_type,\n",
    "                       py_version='py37',\n",
    "                       framework_version='2.3.0',\n",
    "                       hyperparameters={\n",
    "                           'epochs': epochs,\n",
    "                           'learning_rate': learning_rate,\n",
    "                           'dataset_variant': dataset_variant,\n",
    "                           'embedding_dimension': embedding_dimension,                           \n",
    "                           'enable_tensorboard': enable_tensorboard\n",
    "                       },\n",
    "                       metric_definitions=metrics_definitions,\n",
    "                       debugger_hook_config=False\n",
    "            )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create the Experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Experiment name: MovieLens-Recommender-1606694566\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "from smexperiments.experiment import Experiment\n",
    "\n",
    "timestamp = int(time.time())\n",
    "\n",
    "recommender_experiment = Experiment.create(\n",
    "                         experiment_name='MovieLens-Recommender-{}'.format(timestamp),\n",
    "                         description='MovieLens Recommender', \n",
    "                         sagemaker_boto_client=sm)\n",
    "\n",
    "recommender_experiment_name = recommender_experiment.experiment_name\n",
    "print('Experiment name: {}'.format(recommender_experiment_name))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Trial name: trial-1606694566-10-100k-32\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "from smexperiments.trial import Trial\n",
    "\n",
    "timestamp = int(time.time())\n",
    "\n",
    "trial_name = 'trial-{}-{}-{}-{}'.format(timestamp, epochs, dataset_variant, embedding_dimension)\n",
    "\n",
    "trial = Trial.create(trial_name=trial_name,\n",
    "                     experiment_name=recommender_experiment_name,\n",
    "                     sagemaker_boto_client=sm)\n",
    "\n",
    "trial_name = trial.trial_name\n",
    "print('Trial name: {}'.format(trial_name))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "recommender_experiment_config = {\n",
    "    'ExperimentName': recommender_experiment_name,\n",
    "    'TrialName': trial.trial_name,\n",
    "    'TrialComponentDisplayName': 'train'\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train the Model on SageMaker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:sagemaker:Creating training-job with name: tensorflow-training-2020-11-30-00-02-47-224\n"
     ]
    }
   ],
   "source": [
    "estimator.fit(\n",
    "              inputs={\n",
    "                  'train': s3_input_train_data, \n",
    "              },              \n",
    "              experiment_config=recommender_experiment_config,                   \n",
    "              wait=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training Job Name:  tensorflow-training-2020-11-30-00-02-47-224\n"
     ]
    }
   ],
   "source": [
    "recommender_multitask_training_job_name = estimator.latest_training_job.name\n",
    "print('Training Job Name:  {}'.format(recommender_multitask_training_job_name))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/jobs/tensorflow-training-2020-11-30-00-02-47-224\">Training Job</a></b>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/jobs/{}\">Training Job</a></b>'.format(region, recommender_multitask_training_job_name)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logStream:group=/aws/sagemaker/TrainingJobs;prefix=tensorflow-training-2020-11-30-00-02-47-224;streamFilter=typeLogStreamPrefix\">CloudWatch Logs</a></b>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/cloudwatch/home?region={}#logStream:group=/aws/sagemaker/TrainingJobs;prefix={};streamFilter=typeLogStreamPrefix\">CloudWatch Logs</a></b>'.format(region, recommender_multitask_training_job_name)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<b>Review <a target=\"blank\" href=\"https://s3.console.aws.amazon.com/s3/buckets/sagemaker-us-east-1-835319576252/tensorflow-training-2020-11-30-00-02-47-224/?region=us-east-1&tab=overview\">S3 Output Data</a> After The Training Job Has Completed</b>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://s3.console.aws.amazon.com/s3/buckets/{}/{}/?region={}&tab=overview\">S3 Output Data</a> After The Training Job Has Completed</b>'.format(bucket, recommender_multitask_training_job_name, region)))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Wait for Training Job to Finish"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "2020-11-30 00:02:48 Starting - Starting the training job\n",
      "2020-11-30 00:02:52 Starting - Launching requested ML instances.............\n",
      "2020-11-30 00:04:04 Starting - Preparing the instances for training............\n",
      "2020-11-30 00:05:07 Downloading - Downloading input data.............\n",
      "2020-11-30 00:06:17 Training - Downloading the training image.......\n",
      "2020-11-30 00:06:58 Training - Training image download completed. Training in progress.......................................\n",
      "2020-11-30 00:10:13 Uploading - Uploading generated training model\n",
      "2020-11-30 00:10:20 Completed - Training job completed\n",
      "CPU times: user 440 ms, sys: 28.5 ms, total: 469 ms\n",
      "Wall time: 7min 32s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "    \n",
    "estimator.latest_training_job.wait(logs=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Copy the Trained Model from S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "download: s3://sagemaker-us-east-1-835319576252/tensorflow-training-2020-11-30-00-02-47-224/output/model.tar.gz to ./model.tar.gz\n"
     ]
    }
   ],
   "source": [
    "!aws s3 cp s3://$bucket/$recommender_multitask_training_job_name/output/model.tar.gz ./model.tar.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "code/\n",
      "code/inference.py\n",
      "tensorflow/\n",
      "tensorflow/saved_model/\n",
      "tensorflow/saved_model/0/\n",
      "tensorflow/saved_model/0/saved_model.pb\n",
      "tensorflow/saved_model/0/assets/\n",
      "tensorflow/saved_model/0/variables/\n",
      "tensorflow/saved_model/0/variables/variables.data-00000-of-00001\n",
      "tensorflow/saved_model/0/variables/variables.index\n",
      "tensorboard/\n",
      "tensorboard/train/\n",
      "tensorboard/train/events.out.tfevents.1606694851.ip-10-2-103-179.ec2.internal.33.545.v2\n",
      "tensorboard/train/events.out.tfevents.1606694955.ip-10-2-103-179.ec2.internal.33.7808.v2\n",
      "tensorboard/train/plugins/\n",
      "tensorboard/train/plugins/profile/\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_09_17/\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_09_17/ip-10-2-103-179.ec2.internal.xplane.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_09_17/ip-10-2-103-179.ec2.internal.kernel_stats.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_09_17/ip-10-2-103-179.ec2.internal.tensorflow_stats.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_09_17/ip-10-2-103-179.ec2.internal.overview_page.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_09_17/ip-10-2-103-179.ec2.internal.memory_profile.json.gz\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_09_17/ip-10-2-103-179.ec2.internal.trace.json.gz\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_09_17/ip-10-2-103-179.ec2.internal.input_pipeline.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_07_36/\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_07_36/ip-10-2-103-179.ec2.internal.xplane.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_07_36/ip-10-2-103-179.ec2.internal.kernel_stats.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_07_36/ip-10-2-103-179.ec2.internal.tensorflow_stats.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_07_36/ip-10-2-103-179.ec2.internal.overview_page.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_07_36/ip-10-2-103-179.ec2.internal.memory_profile.json.gz\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_07_36/ip-10-2-103-179.ec2.internal.trace.json.gz\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_07_36/ip-10-2-103-179.ec2.internal.input_pipeline.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_08_27/\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_08_27/ip-10-2-103-179.ec2.internal.xplane.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_08_27/ip-10-2-103-179.ec2.internal.kernel_stats.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_08_27/ip-10-2-103-179.ec2.internal.tensorflow_stats.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_08_27/ip-10-2-103-179.ec2.internal.overview_page.pb\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_08_27/ip-10-2-103-179.ec2.internal.memory_profile.json.gz\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_08_27/ip-10-2-103-179.ec2.internal.trace.json.gz\n",
      "tensorboard/train/plugins/profile/2020_11_30_00_08_27/ip-10-2-103-179.ec2.internal.input_pipeline.pb\n",
      "tensorboard/train/events.out.tfevents.1606694856.ip-10-2-103-179.ec2.internal.profile-empty\n",
      "tensorboard/train/events.out.tfevents.1606694905.ip-10-2-103-179.ec2.internal.33.4185.v2\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p ./model/\n",
    "!tar -xvzf ./model.tar.gz -C ./model/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inspect the Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020-11-30 00:10:26.902779: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory\n",
      "2020-11-30 00:10:26.903864: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n",
      "\n",
      "MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:\n",
      "\n",
      "signature_def['__saved_model_init_op']:\n",
      "  The given SavedModel SignatureDef contains the following input(s):\n",
      "  The given SavedModel SignatureDef contains the following output(s):\n",
      "    outputs['__saved_model_init_op'] tensor_info:\n",
      "        dtype: DT_INVALID\n",
      "        shape: unknown_rank\n",
      "        name: NoOp\n",
      "  Method name is: \n",
      "\n",
      "signature_def['serving_default']:\n",
      "  The given SavedModel SignatureDef contains the following input(s):\n",
      "    inputs['input_1'] tensor_info:\n",
      "        dtype: DT_STRING\n",
      "        shape: (-1)\n",
      "        name: serving_default_input_1:0\n",
      "  The given SavedModel SignatureDef contains the following output(s):\n",
      "    outputs['output_1'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1, 10)\n",
      "        name: StatefulPartitionedCall:0\n",
      "    outputs['output_2'] tensor_info:\n",
      "        dtype: DT_STRING\n",
      "        shape: (-1, 10)\n",
      "        name: StatefulPartitionedCall:1\n",
      "  Method name is: tensorflow/serving/predict\n",
      "2020-11-30 00:10:31.968747: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory\n",
      "2020-11-30 00:10:31.969949: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)\n",
      "2020-11-30 00:10:31.970749: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (datascience-1-0-ml-t3-medium-1abf3407f667f989be9d86559395): /proc/driver/nvidia/version does not exist\n",
      "Traceback (most recent call last):\n",
      "  File \"/opt/conda/bin/saved_model_cli\", line 8, in <module>\n",
      "    sys.exit(main())\n",
      "  File \"/opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py\", line 1185, in main\n",
      "    args.func(args)\n",
      "  File \"/opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py\", line 715, in show\n",
      "    _show_all(args.dir)\n",
      "  File \"/opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py\", line 307, in _show_all\n",
      "    _show_defined_functions(saved_model_dir)\n",
      "  File \"/opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py\", line 187, in _show_defined_functions\n",
      "    trackable_object = load.load(saved_model_dir)\n",
      "  File \"/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py\", line 603, in load\n",
      "    return load_internal(export_dir, tags, options)\n",
      "  File \"/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py\", line 633, in load_internal\n",
      "    ckpt_options)\n",
      "  File \"/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py\", line 131, in __init__\n",
      "    self._restore_checkpoint()\n",
      "  File \"/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py\", line 358, in _restore_checkpoint\n",
      "    \"%r from the checkpoint.\" % obj))\n",
      "NotImplementedError: Missing functionality to restore state of object <tensorflow.python.saved_model.load._RestoredResource object at 0x7f70d6ec6850> from the checkpoint.\n"
     ]
    }
   ],
   "source": [
    "!saved_model_cli show --all --dir ./model/tensorflow/saved_model/0/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Make a Sample Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_id = \"42\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020-11-30 00:10:34.464143: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory\n",
      "2020-11-30 00:10:34.464182: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n",
      "2020-11-30 00:10:37.821908: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory\n",
      "2020-11-30 00:10:37.827935: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)\n",
      "2020-11-30 00:10:37.828553: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (datascience-1-0-ml-t3-medium-1abf3407f667f989be9d86559395): /proc/driver/nvidia/version does not exist\n",
      "2020-11-30 00:10:37.832168: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA\n",
      "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "2020-11-30 00:10:37.867108: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499995000 Hz\n",
      "2020-11-30 00:10:37.867888: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5570aa95ca20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n",
      "2020-11-30 00:10:37.867922: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version\n",
      "WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py:444: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.\n",
      "INFO:tensorflow:Restoring parameters from ./model/tensorflow/saved_model/0/variables/variables\n",
      "Result for output key output_1:\n",
      "[[0.01571999 0.01323595 0.012932   0.01239091 0.01233906 0.01191704\n",
      "  0.01171806 0.01165831 0.01142757 0.01099861]]\n",
      "Result for output key output_2:\n",
      "[[b'Twilight (1998)' b'Mortal Kombat: Annihilation (1997)'\n",
      "  b'Mary Reilly (1996)' b'Stag (1997)' b'Pushing Hands (1992)'\n",
      "  b'American Strays (1996)' b'Flesh and Bone (1993)' b'Carrie (1976)'\n",
      "  b'Forget Paris (1995)' b'So I Married an Axe Murderer (1993)']]\n"
     ]
    }
   ],
   "source": [
    "!saved_model_cli run --input_exprs 'input_1=np.array([\"$user_id\"])' --tag_set serve --signature_def serving_default --dir ./model/tensorflow/saved_model/0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Show the Experiment Tracking Lineage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1, 48)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sagemaker.analytics import ExperimentAnalytics\n",
    "\n",
    "lineage_table = ExperimentAnalytics(\n",
    "    sagemaker_session=sess,\n",
    "    experiment_name=recommender_experiment_name,\n",
    "    metric_names=[\n",
    "        'root_mean_squared_error',\n",
    "        'top_10_categorical_accuracy',\n",
    "        'top_50_categorical_accuracy',\n",
    "        'top_100_categorical_accuracy'\n",
    "    ],\n",
    "    sort_by=\"CreationTime\",\n",
    "    sort_order=\"Ascending\",\n",
    ")\n",
    "\n",
    "lineage_df = lineage_table.dataframe()\n",
    "lineage_df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['TrialComponentName', 'DisplayName', 'SourceArn', 'SageMaker.ImageUri',\n",
       "       'SageMaker.InstanceCount', 'SageMaker.InstanceType',\n",
       "       'SageMaker.VolumeSizeInGB', 'dataset_variant', 'embedding_dimension',\n",
       "       'enable_tensorboard', 'epochs', 'learning_rate', 'model_dir',\n",
       "       'sagemaker_container_log_level', 'sagemaker_job_name',\n",
       "       'sagemaker_program', 'sagemaker_region', 'sagemaker_submit_directory',\n",
       "       'top_100_categorical_accuracy - Min',\n",
       "       'top_100_categorical_accuracy - Max',\n",
       "       'top_100_categorical_accuracy - Avg',\n",
       "       'top_100_categorical_accuracy - StdDev',\n",
       "       'top_100_categorical_accuracy - Last',\n",
       "       'top_100_categorical_accuracy - Count', 'root_mean_squared_error - Min',\n",
       "       'root_mean_squared_error - Max', 'root_mean_squared_error - Avg',\n",
       "       'root_mean_squared_error - StdDev', 'root_mean_squared_error - Last',\n",
       "       'root_mean_squared_error - Count', 'top_10_categorical_accuracy - Min',\n",
       "       'top_10_categorical_accuracy - Max',\n",
       "       'top_10_categorical_accuracy - Avg',\n",
       "       'top_10_categorical_accuracy - StdDev',\n",
       "       'top_10_categorical_accuracy - Last',\n",
       "       'top_10_categorical_accuracy - Count',\n",
       "       'top_50_categorical_accuracy - Min',\n",
       "       'top_50_categorical_accuracy - Max',\n",
       "       'top_50_categorical_accuracy - Avg',\n",
       "       'top_50_categorical_accuracy - StdDev',\n",
       "       'top_50_categorical_accuracy - Last',\n",
       "       'top_50_categorical_accuracy - Count', 'train - MediaType',\n",
       "       'train - Value', 'SageMaker.ModelArtifact - MediaType',\n",
       "       'SageMaker.ModelArtifact - Value', 'Trials', 'Experiments'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lineage_df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TrialComponentName</th>\n",
       "      <th>DisplayName</th>\n",
       "      <th>SourceArn</th>\n",
       "      <th>SageMaker.ImageUri</th>\n",
       "      <th>SageMaker.InstanceCount</th>\n",
       "      <th>SageMaker.InstanceType</th>\n",
       "      <th>SageMaker.VolumeSizeInGB</th>\n",
       "      <th>dataset_variant</th>\n",
       "      <th>embedding_dimension</th>\n",
       "      <th>enable_tensorboard</th>\n",
       "      <th>...</th>\n",
       "      <th>top_50_categorical_accuracy - Avg</th>\n",
       "      <th>top_50_categorical_accuracy - StdDev</th>\n",
       "      <th>top_50_categorical_accuracy - Last</th>\n",
       "      <th>top_50_categorical_accuracy - Count</th>\n",
       "      <th>train - MediaType</th>\n",
       "      <th>train - Value</th>\n",
       "      <th>SageMaker.ModelArtifact - MediaType</th>\n",
       "      <th>SageMaker.ModelArtifact - Value</th>\n",
       "      <th>Trials</th>\n",
       "      <th>Experiments</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>tensorflow-training-2020-11-30-00-02-47-224-aw...</td>\n",
       "      <td>train</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:835319576252:train...</td>\n",
       "      <td>763104351884.dkr.ecr.us-east-1.amazonaws.com/t...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>ml.p3.2xlarge</td>\n",
       "      <td>30.0</td>\n",
       "      <td>\"100k\"</td>\n",
       "      <td>32.0</td>\n",
       "      <td>true</td>\n",
       "      <td>...</td>\n",
       "      <td>0.031139</td>\n",
       "      <td>0.003402</td>\n",
       "      <td>0.0276</td>\n",
       "      <td>33</td>\n",
       "      <td>None</td>\n",
       "      <td>s3://sagemaker-us-east-1-835319576252/tensorfl...</td>\n",
       "      <td>None</td>\n",
       "      <td>s3://sagemaker-us-east-1-835319576252/tensorfl...</td>\n",
       "      <td>[trial-1606694566-10-100k-32]</td>\n",
       "      <td>[MovieLens-Recommender-1606694566]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 48 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                  TrialComponentName DisplayName  \\\n",
       "0  tensorflow-training-2020-11-30-00-02-47-224-aw...       train   \n",
       "\n",
       "                                           SourceArn  \\\n",
       "0  arn:aws:sagemaker:us-east-1:835319576252:train...   \n",
       "\n",
       "                                  SageMaker.ImageUri  SageMaker.InstanceCount  \\\n",
       "0  763104351884.dkr.ecr.us-east-1.amazonaws.com/t...                      1.0   \n",
       "\n",
       "  SageMaker.InstanceType  SageMaker.VolumeSizeInGB dataset_variant  \\\n",
       "0          ml.p3.2xlarge                      30.0          \"100k\"   \n",
       "\n",
       "   embedding_dimension enable_tensorboard  ...  \\\n",
       "0                 32.0               true  ...   \n",
       "\n",
       "   top_50_categorical_accuracy - Avg  top_50_categorical_accuracy - StdDev  \\\n",
       "0                           0.031139                              0.003402   \n",
       "\n",
       "  top_50_categorical_accuracy - Last  top_50_categorical_accuracy - Count  \\\n",
       "0                             0.0276                                   33   \n",
       "\n",
       "  train - MediaType                                      train - Value  \\\n",
       "0              None  s3://sagemaker-us-east-1-835319576252/tensorfl...   \n",
       "\n",
       "  SageMaker.ModelArtifact - MediaType  \\\n",
       "0                                None   \n",
       "\n",
       "                     SageMaker.ModelArtifact - Value  \\\n",
       "0  s3://sagemaker-us-east-1-835319576252/tensorfl...   \n",
       "\n",
       "                          Trials                         Experiments  \n",
       "0  [trial-1606694566-10-100k-32]  [MovieLens-Recommender-1606694566]  \n",
       "\n",
       "[1 rows x 48 columns]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lineage_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'TrialComponentName': 'tensorflow-training-2020-11-30-00-02-47-224-aws-training-job',\n",
       " 'TrialComponentArn': 'arn:aws:sagemaker:us-east-1:835319576252:experiment-trial-component/tensorflow-training-2020-11-30-00-02-47-224-aws-training-job',\n",
       " 'DisplayName': 'train',\n",
       " 'Source': {'SourceArn': 'arn:aws:sagemaker:us-east-1:835319576252:training-job/tensorflow-training-2020-11-30-00-02-47-224',\n",
       "  'SourceType': 'SageMakerTrainingJob'},\n",
       " 'Status': {'PrimaryStatus': 'Completed',\n",
       "  'Message': 'Status: Completed, secondary status: Completed, failure reason: .'},\n",
       " 'StartTime': datetime.datetime(2020, 11, 30, 0, 5, 7, tzinfo=tzlocal()),\n",
       " 'EndTime': datetime.datetime(2020, 11, 30, 0, 10, 20, tzinfo=tzlocal()),\n",
       " 'CreationTime': datetime.datetime(2020, 11, 30, 0, 2, 49, 541000, tzinfo=tzlocal()),\n",
       " 'CreatedBy': {'UserProfileArn': 'arn:aws:sagemaker:us-east-1:835319576252:user-profile/d-dsxoghy6ztwy/default-1602368497083',\n",
       "  'UserProfileName': 'default-1602368497083',\n",
       "  'DomainId': 'd-dsxoghy6ztwy'},\n",
       " 'LastModifiedTime': datetime.datetime(2020, 11, 30, 0, 10, 20, 460000, tzinfo=tzlocal()),\n",
       " 'LastModifiedBy': {},\n",
       " 'Parameters': {'SageMaker.ImageUri': {'StringValue': '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.3.0-gpu-py37'},\n",
       "  'SageMaker.InstanceCount': {'NumberValue': 1.0},\n",
       "  'SageMaker.InstanceType': {'StringValue': 'ml.p3.2xlarge'},\n",
       "  'SageMaker.VolumeSizeInGB': {'NumberValue': 30.0},\n",
       "  'dataset_variant': {'StringValue': '\"100k\"'},\n",
       "  'embedding_dimension': {'StringValue': '32', 'NumberValue': 32.0},\n",
       "  'enable_tensorboard': {'StringValue': 'true'},\n",
       "  'epochs': {'StringValue': '10', 'NumberValue': 10.0},\n",
       "  'learning_rate': {'StringValue': '3e-05', 'NumberValue': 3e-05},\n",
       "  'model_dir': {'StringValue': '\"s3://sagemaker-us-east-1-835319576252/tensorflow-training-2020-11-30-00-02-47-224/model\"'},\n",
       "  'sagemaker_container_log_level': {'StringValue': '20', 'NumberValue': 20.0},\n",
       "  'sagemaker_job_name': {'StringValue': '\"tensorflow-training-2020-11-30-00-02-47-224\"'},\n",
       "  'sagemaker_program': {'StringValue': '\"train_multitask.py\"'},\n",
       "  'sagemaker_region': {'StringValue': '\"us-east-1\"'},\n",
       "  'sagemaker_submit_directory': {'StringValue': '\"s3://sagemaker-us-east-1-835319576252/tensorflow-training-2020-11-30-00-02-47-224/source/sourcedir.tar.gz\"'}},\n",
       " 'InputArtifacts': {'train': {'Value': 's3://sagemaker-us-east-1-835319576252/tensorflow_datasets/train/'}},\n",
       " 'OutputArtifacts': {'SageMaker.ModelArtifact': {'Value': 's3://sagemaker-us-east-1-835319576252/tensorflow-training-2020-11-30-00-02-47-224/output/model.tar.gz'}},\n",
       " 'Metrics': [{'MetricName': 'top_100_categorical_accuracy',\n",
       "   'SourceArn': 'arn:aws:sagemaker:us-east-1:835319576252:training-job/tensorflow-training-2020-11-30-00-02-47-224',\n",
       "   'TimeStamp': datetime.datetime(2020, 11, 30, 0, 10, 6, 83000, tzinfo=tzlocal()),\n",
       "   'Max': 0.0652,\n",
       "   'Min': 0.0552,\n",
       "   'Last': 0.0552,\n",
       "   'Count': 33,\n",
       "   'Avg': 0.05967575757575757,\n",
       "   'StdDev': 0.003646576331217262},\n",
       "  {'MetricName': 'root_mean_squared_error',\n",
       "   'SourceArn': 'arn:aws:sagemaker:us-east-1:835319576252:training-job/tensorflow-training-2020-11-30-00-02-47-224',\n",
       "   'TimeStamp': datetime.datetime(2020, 11, 30, 0, 10, 6, 83000, tzinfo=tzlocal()),\n",
       "   'Max': 3.7151,\n",
       "   'Min': 3.6636,\n",
       "   'Last': 3.6715,\n",
       "   'Count': 33,\n",
       "   'Avg': 3.6911363636363634,\n",
       "   'StdDev': 0.01796099625197993},\n",
       "  {'MetricName': 'top_10_categorical_accuracy',\n",
       "   'SourceArn': 'arn:aws:sagemaker:us-east-1:835319576252:training-job/tensorflow-training-2020-11-30-00-02-47-224',\n",
       "   'TimeStamp': datetime.datetime(2020, 11, 30, 0, 10, 6, 83000, tzinfo=tzlocal()),\n",
       "   'Max': 0.0079,\n",
       "   'Min': 0.0048,\n",
       "   'Last': 0.0049,\n",
       "   'Count': 33,\n",
       "   'Avg': 0.006193939393939393,\n",
       "   'StdDev': 0.001173919554829939},\n",
       "  {'MetricName': 'top_50_categorical_accuracy',\n",
       "   'SourceArn': 'arn:aws:sagemaker:us-east-1:835319576252:training-job/tensorflow-training-2020-11-30-00-02-47-224',\n",
       "   'TimeStamp': datetime.datetime(2020, 11, 30, 0, 10, 6, 83000, tzinfo=tzlocal()),\n",
       "   'Max': 0.0361,\n",
       "   'Min': 0.0276,\n",
       "   'Last': 0.0276,\n",
       "   'Count': 33,\n",
       "   'Avg': 0.03113939393939394,\n",
       "   'StdDev': 0.003401832171229515}],\n",
       " 'ResponseMetadata': {'RequestId': 'bc3e444e-5a8f-4bbb-9535-6ede946933d5',\n",
       "  'HTTPStatusCode': 200,\n",
       "  'HTTPHeaders': {'x-amzn-requestid': 'bc3e444e-5a8f-4bbb-9535-6ede946933d5',\n",
       "   'content-type': 'application/x-amz-json-1.1',\n",
       "   'content-length': '3404',\n",
       "   'date': 'Mon, 30 Nov 2020 00:10:38 GMT'},\n",
       "  'RetryAttempts': 0}}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sm.describe_trial_component(TrialComponentName=lineage_df.TrialComponentName[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pass Variables to the Next Notebook(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stored 'recommender_multitask_training_job_name' (str)\n"
     ]
    }
   ],
   "source": [
    "%store recommender_multitask_training_job_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
